Heterogeneous Uncertainty Sampling for Supervised Learning

Authors

David D. Lewis

Jason Catlett

Abstract

Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suited for an application may be too expensive to train or use during the selection of instances. We test the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program). Despite being chosen by this heterogeneous approach, the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
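The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-feature class-mean selector, the threshold "stump" standing in for C4.5, the batch size, and all function names (`train_selector`, `uncertainty_sample`, `train_target`, `oracle`) are hypothetical choices made for this example.

```python
import math


def train_selector(labeled):
    """Cheap probabilistic selector (an assumption of this sketch).

    Models each class by the mean of a single numeric feature and
    returns a function x -> estimated P(class = 1 | x).
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)

    def prob(x):
        # Logistic of the difference in squared distance to each class mean;
        # the estimate is 0.5 halfway between the two means.
        return 1.0 / (1.0 + math.exp((x - mu_pos) ** 2 - (x - mu_neg) ** 2))

    return prob


def uncertainty_sample(pool, oracle, seed, rounds, batch):
    """Iteratively label the instances the cheap selector is least sure about.

    `seed` must contain at least one example of each class.
    """
    labeled = list(seed)
    seen = {x for x, _ in seed}
    unlabeled = [x for x in pool if x not in seen]
    for _ in range(rounds):
        prob = train_selector(labeled)
        # Most uncertain first: estimated probability closest to 0.5.
        unlabeled.sort(key=lambda x: abs(prob(x) - 0.5))
        chosen, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled.extend((x, oracle(x)) for x in chosen)
    return labeled


def train_target(labeled):
    """Stand-in for the expensive target learner (C4.5 in the paper).

    Here it is only a threshold rule: predict class 1 when x > threshold.
    """
    xs = sorted(x for x, _ in labeled)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in labeled)

    return min(candidates, key=errors)


def oracle(x):
    # Hypothetical expert: the true class boundary sits at 0.6.
    return int(x > 0.6)


pool = [i / 200 for i in range(201)]
seed = [(0.0, 0), (1.0, 1)]
labeled = uncertainty_sample(pool, oracle, seed, rounds=5, batch=4)
threshold = train_target(labeled)
```

The point of the sketch is the division of labor: only the inexpensive selector is retrained inside the loop, and the target learner is trained once on the labeled sample that the selector chose.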

Related resources

Paired Sampling in Density-Sensitive Active Learning

Active learning consists of principled on-line sampling over unlabeled data to optimize supervised learning rates as a function of the number of labels requested from an external oracle. A new sampling technique for active learning is developed based on two key principles: 1) Balanced sampling on both sides of the decision boundary is more effective than sampling one side disproportionately, an...

Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing

The common uncertain sampling approach searches for the most uncertain samples closest to the decision boundary for a classification task. However, we might fail to find the uncertain samples when we have a poor probabilistic model. In this work, we develop an active learning strategy called “Uncertainty Sampling with Biasing Consensus” (USBC) which predicts the unbalanced data by multi-model c...

Uncertainty Quantification in the Classification of High Dimensional Data

Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior di...

Active Learning-Based Elicitation for Semi-Supervised Word Alignment

Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain a...

Publication date: 1994
